feat: add local whisper.cpp voice transcription provider by thereisnotime · Pull Request #157 · RichardAtCT/claude-code-telegram

thereisnotime · 2026-03-20T00:30:54Z

Summary

Adds a third voice transcription provider (VOICE_PROVIDER=local) that uses whisper.cpp and ffmpeg for fully offline, API-key-free voice message transcription
New settings: WHISPER_CPP_BINARY_PATH and WHISPER_CPP_MODEL_PATH for configuring the local binary and model
Dedicated setup guide at docs/local-whisper-cpp.md with build-from-source instructions, model download links, and troubleshooting tips

Changes

src/bot/features/voice_handler.py — new _transcribe_local() pipeline: OGG→WAV (ffmpeg) → whisper.cpp binary
src/config/settings.py — whisper_cpp_binary_path, whisper_cpp_model_path fields + resolver properties
src/config/features.py — local provider skips API key check
src/bot/features/registry.py — updated key-availability logic
src/bot/handlers/message.py / src/bot/orchestrator.py — provider-aware error messages
docs/local-whisper-cpp.md — full build & setup guide
.env.example, CLAUDE.md, README.md, docs/configuration.md — documentation updates
Tests — full coverage for local provider (ffmpeg, binary, model, empty output, non-zero exit)
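
The OGG→WAV→whisper.cpp pipeline listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: `build_ffmpeg_cmd`, `build_whisper_cmd`, and `transcribe_local` are hypothetical names, the 16 kHz mono PCM conversion and the `-m` / `-f` / `-nt` flags reflect common whisper.cpp usage rather than anything confirmed by the diff, and timeouts and logging are left out for brevity.

```python
import asyncio
import tempfile
from pathlib import Path


def build_ffmpeg_cmd(ogg_path: Path, wav_path: Path) -> list[str]:
    """Convert Telegram's OGG audio to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y", "-i", str(ogg_path),
        "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le",
        str(wav_path),
    ]


def build_whisper_cmd(binary: str, model: Path, wav_path: Path) -> list[str]:
    """Build the whisper.cpp call; -nt drops timestamps from the output."""
    return [binary, "-m", str(model), "-f", str(wav_path), "-nt"]


async def _run(cmd: list[str]) -> str:
    """Run a command without a shell and return its decoded stdout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        detail = stderr.decode(errors="replace").strip()
        raise RuntimeError(f"{cmd[0]} exited with {proc.returncode}: {detail}")
    return stdout.decode(errors="replace")


async def transcribe_local(voice_bytes: bytes, binary: str, model: Path) -> str:
    """Hypothetical end-to-end helper: bytes in, transcript out."""
    # The temporary directory (and both audio files) is removed on exit,
    # whether or not a step raised.
    with tempfile.TemporaryDirectory() as tmp:
        ogg_path = Path(tmp) / "voice.ogg"
        wav_path = ogg_path.with_suffix(".wav")
        ogg_path.write_bytes(voice_bytes)
        await _run(build_ffmpeg_cmd(ogg_path, wav_path))
        text = (await _run(build_whisper_cmd(binary, model, wav_path))).strip()
    if not text:
        raise RuntimeError("whisper.cpp produced an empty transcription")
    return text
```

Keeping the command builders separate from the subprocess call makes the argument lists testable without ffmpeg or whisper.cpp installed.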

Test plan

Run existing test suite (pytest) — all tests should pass
Verify VOICE_PROVIDER=local with whisper.cpp installed transcribes a real voice message
Verify clear error messages when ffmpeg / whisper.cpp binary / model file is missing
Verify VOICE_PROVIDER=mistral and VOICE_PROVIDER=openai still work unchanged

🤖 Generated with Claude Code

thereisnotime · 2026-03-20T00:38:57Z

Hey @RichardAtCT 👋 — would appreciate a review when you get a chance! This adds a local whisper.cpp voice transcription provider (no API keys needed).

FridayOpenClawBot · 2026-03-20T06:26:10Z

PR Review
Reviewed head: affa44f2a351a86e7bb4e3834cc8b6504b6299e0

Summary

Adds a third voice transcription provider (VOICE_PROVIDER=local) backed by whisper.cpp + ffmpeg — fully offline, no API key required
New settings WHISPER_CPP_BINARY_PATH / WHISPER_CPP_MODEL_PATH with sensible defaults and named-model resolution to ~/.cache/whisper-cpp/ggml-{name}.bin
Full unit test coverage for all error paths (ffmpeg missing, binary missing, model missing, empty output, non-zero exit)
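
The named-model resolution mentioned above could look something like the sketch below. Only the target layout `~/.cache/whisper-cpp/ggml-{name}.bin` comes from the review; the function name `resolve_model_path`, the rule used to tell a bare name from a path, and the `base` default are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

# Cache location quoted in the review summary.
MODEL_CACHE_DIR = Path.home() / ".cache" / "whisper-cpp"


def resolve_model_path(value: Optional[str], default_name: str = "base") -> Path:
    """Map a WHISPER_CPP_MODEL_PATH-style value to a concrete model file path.

    Assumption: anything containing a path separator or ending in ".bin"
    is an explicit path; everything else is a model name such as "small".
    """
    if not value:
        value = default_name
    looks_like_path = "/" in value or "\\" in value or value.endswith(".bin")
    if looks_like_path:
        return Path(value).expanduser()
    return MODEL_CACHE_DIR / f"ggml-{value}.bin"
```

With this rule, `resolve_model_path("small")` points into the cache directory, while an explicit path such as `/opt/models/ggml-tiny.bin` is returned unchanged.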

What looks good

Clean provider abstraction — _transcribe_local is well-isolated and the existing Mistral/OpenAI paths are untouched
Tempfile cleanup in a finally block is correct; no risk of leaking WAV files even on failure
Error messages are actionable (include install commands and env var names) — good UX for a self-hosted setup

Issues / questions

[Important] src/bot/features/voice_handler.py — Neither _convert_ogg_to_wav nor _run_whisper_cpp has a timeout. process.communicate() will block indefinitely if ffmpeg or whisper.cpp stalls. A near-20 MB file on a slow machine (or a model file that takes a long time to load the first time) could tie up the bot until the process exits or is killed externally. Consider asyncio.wait_for(process.communicate(), timeout=120) (or whatever the existing GIT_OPERATIONS_TIMEOUT pattern uses), raising a RuntimeError("transcription timed out") on expiry so the user gets feedback.
[Nit] src/bot/features/voice_handler.py — _resolve_whisper_binary validates via shutil.which(binary) but returns the original unresolved string (binary), discarding the fully-qualified path (resolved). This is fine for subprocess dispatch since PATH lookup happens again at exec time, but it means the validated path isn't reused — if PATH somehow changes between validation and execution, the nice error message is bypassed and you'd get a raw FileNotFoundError. Returning resolved from the method would make validation and execution consistent.
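
Both points above can be illustrated with a short sketch. It is not the PR's code: `resolve_binary` and `communicate_with_timeout` are hypothetical helpers showing one way to return the path that `shutil.which()` validated and to bound `process.communicate()` with `asyncio.wait_for`.

```python
import asyncio
import contextlib
import shutil
import sys


def resolve_binary(binary: str) -> str:
    """Return the fully-qualified path that was validated, not the raw input."""
    resolved = shutil.which(binary)
    if resolved is None:
        raise RuntimeError(
            f"{binary!r} not found or not executable; check WHISPER_CPP_BINARY_PATH"
        )
    return resolved


async def communicate_with_timeout(
    cmd: list[str], timeout: float
) -> tuple[bytes, bytes]:
    """Run cmd without a shell; kill it and raise if it exceeds timeout seconds."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        return await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # The process may have exited between the timeout and the kill.
        with contextlib.suppress(ProcessLookupError):
            proc.kill()
        await proc.wait()  # reap the child so it does not linger as a zombie
        raise RuntimeError(f"{cmd[0]} timed out after {timeout:g}s") from None


if __name__ == "__main__":
    # Demo with the current Python interpreter standing in for ffmpeg.
    out, _ = asyncio.run(
        communicate_with_timeout([sys.executable, "-c", "print('ok')"], 30)
    )
    print(out.decode().strip())
```

Passing the value returned by `resolve_binary` as `cmd[0]` means the executable that was checked is the one that gets executed.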

Verdict
⚠️ Merge after fixes — timeout on subprocess calls is the main gap; everything else is solid.

— Friday, AI assistant to @RichardAtCT

RichardAtCT · 2026-03-20T06:28:33Z

> Hey @RichardAtCT 👋 — would appreciate a review when you get a chance! This adds a local whisper.cpp voice transcription provider (no API keys needed).

Thanks - great idea. I actually use local whisper everywhere else so this makes sense!

Can you please fix the timeout flagged by @FridayOpenClawBot and the failing lint? Then it is good to merge.

Add a third voice provider option (VOICE_PROVIDER=local) that transcribes Telegram voice messages entirely offline using whisper.cpp and ffmpeg. No API keys or cloud services required.

- New local provider in voice_handler.py (OGG->WAV via ffmpeg, then whisper.cpp)
- Settings: WHISPER_CPP_BINARY_PATH, WHISPER_CPP_MODEL_PATH
- Feature flag, registry, and error messages updated for local provider
- Dedicated build/setup guide at docs/local-whisper-cpp.md
- Full test coverage for the local provider path
- Updated .env.example, CLAUDE.md, README.md, docs/configuration.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

thereisnotime · 2026-03-20T08:59:10Z

Thanks for the review @RichardAtCT and @FridayOpenClawBot! Both issues have been addressed:

Timeouts — Added asyncio.wait_for(..., timeout=120) to both _convert_ogg_to_wav and _run_whisper_cpp. On timeout the subprocess is killed and a clear RuntimeError is raised.
Resolved binary path — _resolve_whisper_binary now caches and returns the fully-qualified path from shutil.which() so validation and execution are consistent.
Lint — Ran black + isort on all affected files.

Also updated docs/setup.md with the local provider configuration example and a link to the full build guide.

RichardAtCT · 2026-03-27T00:09:55Z

Good feature addition — offline transcription is genuinely useful and the architecture fits cleanly into the existing provider pattern. Several issues need addressing before merge.

🐛 Critical: No subprocess timeouts

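Both ffmpeg and whisper.cpp are awaited with no timeout. A hung process (corrupted audio, slow disk, large model) will stall the awaiting handler indefinitely: the await itself yields to the asyncio event loop, but the voice request never completes and the user gets no feedback. Add asyncio.wait_for: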

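try:
    _, ffmpeg_stderr = await asyncio.wait_for(
        ffmpeg_proc.communicate(), timeout=30.0
    )
except asyncio.TimeoutError:
    # Kill the stalled process and reap it so no zombie is left behind.
    ffmpeg_proc.kill()
    await ffmpeg_proc.wait()
    raise RuntimeError("ffmpeg timed out after 30s") from None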

Same for the whisper subprocess. Timeout values should be configurable via settings (e.g. whisper_cpp_timeout: int = 120).
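
One way the configurable timeouts could be modelled is sketched below. The field name `whisper_cpp_timeout` and its default of 120 come from the suggestion above; the `ffmpeg_timeout` field, the environment variable names, and the use of a plain dataclass in place of the project's own settings class are assumptions for illustration only.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_FFMPEG_TIMEOUT = 30     # seconds, matching the snippet above
DEFAULT_WHISPER_TIMEOUT = 120   # seconds, matching the suggested default


def _positive_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Read a positive integer from env, falling back to default when unset."""
    raw = env.get(name)
    if raw is None or raw == "":
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{name} must be a positive number of seconds, got {value}")
    return value


@dataclass(frozen=True)
class LocalVoiceTimeouts:
    """Hypothetical stand-in for the relevant settings fields."""

    ffmpeg_timeout: int = DEFAULT_FFMPEG_TIMEOUT
    whisper_cpp_timeout: int = DEFAULT_WHISPER_TIMEOUT

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "LocalVoiceTimeouts":
        # FFMPEG_TIMEOUT and WHISPER_CPP_TIMEOUT are assumed variable names.
        env = os.environ if env is None else env
        return cls(
            ffmpeg_timeout=_positive_int(env, "FFMPEG_TIMEOUT", DEFAULT_FFMPEG_TIMEOUT),
            whisper_cpp_timeout=_positive_int(
                env, "WHISPER_CPP_TIMEOUT", DEFAULT_WHISPER_TIMEOUT
            ),
        )
```

Rejecting zero and negative values at load time avoids a timeout that fires immediately on every request.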

🔒 Minor: Temp file path construction

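wav_path = ogg_path.replace(".ogg", ".wav") is fragile: it rewrites every ".ogg" in the path, not just the extension, and leaves the path unchanged when the file has a different suffix. Use: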

wav_path = Path(ogg_path).with_suffix(".wav")

Also: WHISPER_CPP_BINARY_PATH is passed directly to create_subprocess_exec. No shell injection risk since create_subprocess_exec doesn't invoke a shell — but worth documenting explicitly. Consider validating the resolved path is executable at startup.
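
The temp-file point above is easy to demonstrate. The paths below are made up for illustration, and `PurePosixPath` is used only so the printed result is the same on every platform; `Path.with_suffix` behaves the same way.

```python
from pathlib import PurePosixPath

ogg_path = "/tmp/cache.ogg/voice.ogg"

# str.replace rewrites every ".ogg" occurrence, including the directory name.
fragile = ogg_path.replace(".ogg", ".wav")

# with_suffix only swaps the extension of the final path component.
robust = PurePosixPath(ogg_path).with_suffix(".wav")

# With a different extension, replace() silently returns the input path,
# so the "WAV" output would overwrite the original file.
unchanged = "/tmp/voice.oga".replace(".ogg", ".wav")

print(fragile)    # /tmp/cache.wav/voice.wav
print(robust)     # /tmp/cache.ogg/voice.wav
print(unchanged)  # /tmp/voice.oga
```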

⚠️ Misconfiguration errors surface at request time, not startup

whisper_cpp_binary_path_resolved and whisper_cpp_model_path_resolved are computed on every request. A user who misconfigures the binary path won't find out until they send a voice message. Add a validate_local_provider() method called at bot startup (alongside existing provider validation) that calls both properties once and catches ValueError. Much better UX.

🔤 Type annotations

shutil is imported inside the property body — move to module-level imports per isort requirements
Field(None, ...) — the ... as second positional arg is unusual; prefer Field(default=None, description="...") for clarity and mypy friendliness
If whisper_cpp_timeout is added as recommended, ensure it's typed

🧪 Test coverage

Confirm tests cover:

ffmpeg failure (non-zero return code)
whisper.cpp failure
Empty transcription result
Temp file cleanup after each failure path
Timeout scenario (once timeout handling is added — this is a must-have test before merge)
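
As one example from the checklist above, a cleanup-after-failure test might be shaped like this. `with_temp_wav` is a hypothetical helper standing in for the handler's real try/finally block, so the sketch shows the pattern rather than the project's actual test.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, List


def with_temp_wav(work: Callable[[Path], str]) -> str:
    """Create a temporary WAV path, run work on it, and always delete it."""
    fd, name = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    wav_path = Path(name)
    try:
        return work(wav_path)
    finally:
        wav_path.unlink(missing_ok=True)


def test_wav_removed_when_transcription_fails() -> None:
    seen: List[Path] = []

    def failing_work(wav_path: Path) -> str:
        seen.append(wav_path)
        assert wav_path.exists()  # the file is present while work runs
        raise RuntimeError("whisper.cpp exited with code 1")

    try:
        with_temp_wav(failing_work)
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected RuntimeError")

    # The finally block must have removed the file despite the failure.
    assert seen and not seen[0].exists()
```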

Minor

Log the resolved binary/model path at INFO level on first use (structlog) — helps ops debugging
docs/local-whisper-cpp.md should note the ffmpeg system dependency explicitly

Summary: Clean implementation that fits the provider pattern well. The timeout issue is the main blocker: a hung whisper.cpp process will leave a voice request pending indefinitely in production. Everything else is polish.

— Friday, AI assistant to @RichardAtCT (posted as @RichardAtCT — FridayOpenClawBot access pending)

thereisnotime force-pushed the feat/local-whisper-cpp-provider branch from 9524828 to affa44f on March 20, 2026 00:38

thereisnotime force-pushed the feat/local-whisper-cpp-provider branch from affa44f to 5501304 on March 20, 2026 08:58

thereisnotime wants to merge 1 commit into RichardAtCT:main from thereisnotime:feat/local-whisper-cpp-provider
